initial distribution



Quantum Algorithms for Sampling Log-Concave Distributions and Estimating Normalizing Constants

Neural Information Processing Systems

Given a convex function $f\colon \mathbb{R}^d \to \mathbb{R}$, the problem of sampling from a distribution proportional to $e^{-f(x)}$ is called log-concave sampling. This task has wide applications in machine learning, physics, statistics, etc.
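The paper's contribution is quantum, but the classical problem it speeds up is easy to make concrete. Below is a minimal sketch of the unadjusted Langevin algorithm, a standard classical sampler for densities proportional to $e^{-f(x)}$, assuming a differentiable $f$; the quadratic example target and the helper name `ula_sample` are illustrative, not from the paper.

```python
import numpy as np

def ula_sample(grad_f, x0, step=1e-2, n_steps=5000, rng=None):
    """Unadjusted Langevin algorithm: x <- x - step*grad_f(x) + sqrt(2*step)*noise.

    Targets the log-concave density proportional to exp(-f(x)).
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x0, dtype=float)
    samples = []
    for _ in range(n_steps):
        x = x - step * grad_f(x) + np.sqrt(2 * step) * rng.standard_normal(x.shape)
        samples.append(x.copy())
    return np.array(samples)

# Example: f(x) = ||x||^2 / 2, so the target is a standard Gaussian.
samples = ula_sample(grad_f=lambda x: x, x0=np.zeros(3))
print(samples[1000:].mean(axis=0))  # approximately 0 in each coordinate
```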



Online Markov Decision Processes with Terminal Law Constraints

Moreno, Bianca Marin, Brégère, Margaux, Gaillard, Pierre, Oudjane, Nadia

arXiv.org Machine Learning

Traditional reinforcement learning usually assumes either episodic interactions with resets or continuous operation to minimize average or cumulative loss. While episodic settings have many theoretical results, resets are often unrealistic in practice. The infinite-horizon setting avoids this issue but lacks non-asymptotic guarantees in online scenarios with unknown dynamics. In this work, we move towards closing this gap by introducing a reset-free framework called the periodic framework, where the goal is to find periodic policies: policies that not only minimize cumulative loss but also return the agents to their initial state distribution after a fixed number of steps. We formalize the problem of finding optimal periodic policies and identify sufficient conditions under which it is well-defined for tabular Markov decision processes. To evaluate algorithms in this framework, we introduce the periodic regret, a measure that balances cumulative loss with the terminal law constraint. We then propose the first algorithms for computing periodic policies in two multi-agent settings and show they achieve sublinear periodic regret of order $\tilde O(T^{3/4})$. This provides the first non-asymptotic guarantees for reset-free learning in the setting of $M$ homogeneous agents, for $M > 1$.
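To make the terminal-law constraint concrete: a periodic policy must map the initial state distribution back to itself after a fixed number of steps $k$. Below is a minimal sketch of checking that condition for a tabular MDP with known dynamics; the helper `terminal_law_gap` and the toy transition numbers are hypothetical, and this is a feasibility check, not the paper's learning algorithm.

```python
import numpy as np

def terminal_law_gap(P, pi, mu0, k):
    """Propagate the state law under policy pi for k steps and return the
    total-variation distance to the initial distribution mu0.

    P:  transition tensor, P[s, a, s'] = Pr(s' | s, a)
    pi: policy matrix, pi[s, a] = Pr(a | s)
    """
    # State-to-state kernel induced by pi: P_pi[s, s'] = sum_a pi[s,a] * P[s,a,s']
    P_pi = np.einsum("sa,sat->st", pi, P)
    mu = mu0.copy()
    for _ in range(k):
        mu = mu @ P_pi
    return 0.5 * np.abs(mu - mu0).sum()

# Toy 2-state, 2-action MDP (illustrative numbers, not from the paper).
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.3, 0.7], [0.8, 0.2]]])
pi = np.full((2, 2), 0.5)          # uniform policy
mu0 = np.array([0.5, 0.5])
# Prints the constraint violation; 0 means pi is k-periodic w.r.t. mu0.
print(terminal_law_gap(P, pi, mu0, k=4))
```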



Psi-Sampler: Initial Particle Sampling for SMC-Based Inference-Time Reward Alignment in Score Models

Yoon, Taehoon, Min, Yunhong, Yeo, Kyeongmin, Sung, Minhyuk

arXiv.org Artificial Intelligence

We introduce $\Psi$-Sampler, an SMC-based framework incorporating pCNL-based initial particle sampling for effective inference-time reward alignment with a score-based generative model. Inference-time reward alignment with score-based generative models has recently gained significant traction, following a broader paradigm shift from pre-training to post-training optimization. At the core of this trend is the application of Sequential Monte Carlo (SMC) to the denoising process. However, existing methods typically initialize particles from the Gaussian prior, which inadequately captures reward-relevant regions and results in reduced sampling efficiency. We demonstrate that initializing from the reward-aware posterior significantly improves alignment performance. To enable posterior sampling in high-dimensional latent spaces, we introduce the preconditioned Crank-Nicolson Langevin (pCNL) algorithm, which combines dimension-robust proposals with gradient-informed dynamics. This approach enables efficient and scalable posterior sampling and consistently improves performance across various reward alignment tasks, including layout-to-image generation, quantity-aware generation, and aesthetic-preference generation, as demonstrated in our experiments. Project Webpage: https://psi-sampler.github.io/
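The dimension-robustness claim is easiest to see in the gradient-free preconditioned Crank-Nicolson (pCN) kernel, which pCNL extends with gradient-informed drift. Below is a minimal sketch of a pCN step, assuming a standard Gaussian prior $N(0, I)$ on the latent and a log-reward term playing the role of the likelihood; the toy reward is illustrative.

```python
import numpy as np

def pcn_step(u, log_reward, beta=0.2, rng=None):
    """One preconditioned Crank-Nicolson (pCN) Metropolis step targeting the
    posterior proportional to exp(log_reward(u)) * N(u; 0, I).

    The proposal preserves the Gaussian prior, so the accept ratio involves
    only the reward (likelihood) term -- this is what makes pCN robust to
    the latent dimension.
    """
    rng = np.random.default_rng() if rng is None else rng
    xi = rng.standard_normal(u.shape)
    v = np.sqrt(1.0 - beta**2) * u + beta * xi       # pCN proposal
    if np.log(rng.uniform()) < log_reward(v) - log_reward(u):
        return v, True
    return u, False

# Toy posterior: the reward pulls latents toward the point (2, ..., 2).
target = 2.0 * np.ones(64)
log_reward = lambda u: -0.5 * np.sum((u - target) ** 2)

u = np.zeros(64)
for _ in range(2000):
    u, _ = pcn_step(u, log_reward)
print(u.mean())  # drifts toward ~1.0, the mean of prior x reward
```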


Initial Distribution Sensitivity of Constrained Markov Decision Processes

Tercan, Alperen, Ozay, Necmiye

arXiv.org Artificial Intelligence

Constrained Markov Decision Processes (CMDPs) are notably more complex to solve than standard MDPs due to the absence of universally optimal policies across all initial state distributions. This necessitates re-solving the CMDP whenever the initial distribution changes. In this work, we analyze how the optimal value of CMDPs varies with different initial distributions, deriving bounds on these variations using duality analysis of CMDPs and perturbation analysis in linear programming. Moreover, we show how such bounds can be used to analyze the regret of a given policy due to unknown variations of the initial distribution.
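The duality analysis starts from the standard occupancy-measure linear program for discounted CMDPs, in which the initial distribution $\mu$ enters only through the right-hand side of the flow constraints. Below is a minimal sketch of solving that LP for two different initial distributions, assuming a tabular discounted CMDP; the helper `cmdp_value` and the toy numbers are hypothetical, and the sketch illustrates the sensitivity being bounded, not the paper's bounds themselves.

```python
import numpy as np
from scipy.optimize import linprog

def cmdp_value(P, r, c, budget, mu, gamma=0.9):
    """Optimal discounted reward of a tabular CMDP for initial law mu, via the
    occupancy-measure LP:
        max_d  sum r*d   s.t.  flow(d) = mu,  sum c*d <= budget,  d >= 0.
    P[s,a,s'] transitions, r[s,a] rewards, c[s,a] costs.
    """
    S, A, _ = P.shape
    n = S * A
    # Flow rows: sum_a d(s,a) - gamma * sum_{s',a'} P[s',a',s] d(s',a') = mu(s)
    A_eq = np.zeros((S, n))
    for s in range(S):
        for a in range(A):
            A_eq[s, s * A + a] += 1.0
            A_eq[:, s * A + a] -= gamma * P[s, a, :]
    res = linprog(-r.ravel(),                      # linprog minimizes
                  A_ub=c.ravel()[None, :], b_ub=[budget],
                  A_eq=A_eq, b_eq=mu, bounds=(0, None))
    return -res.fun

# Toy 2-state, 2-action CMDP (hypothetical numbers).
P = np.array([[[0.9, 0.1], [0.1, 0.9]],
              [[0.5, 0.5], [0.2, 0.8]]])
r = np.array([[1.0, 0.0], [0.0, 2.0]])
c = np.array([[0.0, 1.0], [1.0, 0.0]])
for mu in (np.array([1.0, 0.0]), np.array([0.0, 1.0])):
    print(mu, cmdp_value(P, r, c, budget=2.0, mu=mu))  # value shifts with mu
```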


Diffusion Bridge Variational Inference for Deep Gaussian Processes

Xu, Jian, Zhao, Qibin, Paisley, John, Zeng, Delu

arXiv.org Artificial Intelligence

Deep Gaussian processes (DGPs) enable expressive hierarchical Bayesian modeling but pose substantial challenges for posterior inference, especially over inducing variables. Denoising diffusion variational inference (DDVI) addresses this by modeling the posterior as a time-reversed diffusion from a simple Gaussian prior. However, DDVI's fixed unconditional starting distribution remains far from the complex true posterior, resulting in inefficient inference trajectories and slow convergence. In this work, we propose Diffusion Bridge Variational Inference (DBVI), a principled extension of DDVI that initiates the reverse diffusion from a learnable, data-dependent initial distribution. This initialization is parameterized via an amortized neural network and progressively adapted using gradients from the ELBO objective, reducing the posterior gap and improving sample efficiency. To enable scalable amortization, we design the network to operate on the inducing inputs, which serve as structured, low-dimensional summaries of the dataset and naturally align with the inducing variables' shape. DBVI retains the mathematical elegance of DDVI, including Girsanov-based ELBOs and reverse-time SDEs, while reinterpreting the prior via a Doob-bridged diffusion process. We derive a tractable training objective under this formulation and implement DBVI for scalable inference in large-scale DGPs. Across regression, classification, and image reconstruction tasks, DBVI consistently outperforms DDVI and other variational baselines in predictive accuracy, convergence speed, and posterior quality.
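The single ingredient DBVI adds over DDVI is the learnable, data-dependent start of the reverse diffusion. Below is a minimal PyTorch sketch of that ingredient alone, assuming a diagonal Gaussian over the inducing variables amortized from the inducing inputs; the module `AmortizedInit`, the stand-in `score_net`, and all shapes are hypothetical, and none of the Girsanov/Doob-bridge machinery is reproduced here.

```python
import torch
import torch.nn as nn

class AmortizedInit(nn.Module):
    """Map inducing inputs Z (m x d) to a diagonal Gaussian over the
    inducing variables u (m,), replacing a fixed N(0, I) starting law."""
    def __init__(self, d, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 2))  # per-point mean, log-std

    def forward(self, Z):
        out = self.net(Z)                    # (m, 2)
        return out[:, 0], out[:, 1]          # mean, log_std

def sample_reverse(init, score_net, Z, n_steps=50, dt=1e-2):
    """Euler-Maruyama pass through a (simplified, time-homogeneous) reverse
    SDE, started from the learnable initial distribution. The start is
    reparameterized, so ELBO gradients can flow back into `init`."""
    mean, log_std = init(Z)
    u = mean + log_std.exp() * torch.randn_like(mean)   # reparameterized start
    for _ in range(n_steps):
        u = u + score_net(u) * dt + (2 * dt) ** 0.5 * torch.randn_like(u)
    return u

# Toy usage with a hypothetical score network.
Z = torch.randn(16, 3)                       # 16 inducing inputs in R^3
init = AmortizedInit(d=3)
score_net = lambda u: -u                     # stands in for a learned score
u = sample_reverse(init, score_net, Z)
print(u.shape)                               # torch.Size([16])
```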


First-Extinction Law for Resampling Processes

Benati, Matteo, Londei, Alessandro, Lanzieri, Denise, Loreto, Vittorio

arXiv.org Machine Learning

Extinction times in resampling processes are fundamental yet often intractable, as previous formulas scale as $2^M$ with the number of states $M$ present in the initial probability distribution. We solve this by treating multinomial updates as independent zero-drift square-root diffusions, yielding a closed-form law for the first-extinction time. We prove that the mean coincides exactly with the Wright-Fisher result of Baxter et al., thereby replacing exponential-cost evaluations with a linear-cost expression, and we validate this result through extensive simulations. Finally, we demonstrate predictive power for model collapse in a simple self-training setup: the onset of collapse coincides with the resampling-driven first-extinction time computed from the model's initial stationary distribution. These results hint at a unified view of resampling extinction dynamics.
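The closed-form law itself is not reproduced in the abstract, but the process it describes is simple to simulate: repeatedly redraw a multinomial sample from the current empirical distribution and record the generation at which an initially present state first loses all its mass. A minimal simulation sketch follows; the helper `first_extinction_time` and the uniform 5-state example are chosen for illustration.

```python
import numpy as np

def first_extinction_time(p0, N, rng=None, max_gen=10_000):
    """Simulate multinomial resampling (Wright-Fisher style) with N draws per
    generation; return the generation at which the first initially-present
    state loses all its mass."""
    rng = np.random.default_rng() if rng is None else rng
    p = np.asarray(p0, dtype=float)
    alive = p > 0                         # states present at the start
    for t in range(1, max_gen + 1):
        counts = rng.multinomial(N, p)
        p = counts / N
        if np.any(alive & (p == 0)):      # some initially-present state died out
            return t
    return max_gen

# Empirical mean first-extinction time for a uniform 5-state start.
times = [first_extinction_time([0.2] * 5, N=100) for _ in range(1000)]
print(np.mean(times))
```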


Population-aware Online Mirror Descent for Mean-Field Games with Common Noise by Deep Reinforcement Learning

Wu, Zida, Lauriere, Mathieu, Geist, Matthieu, Pietquin, Olivier, Mehta, Ankur

arXiv.org Artificial Intelligence

Mean Field Games (MFGs) offer a powerful framework for studying large-scale multi-agent systems. Yet, learning Nash equilibria in MFGs remains a challenging problem, particularly when the initial distribution is unknown or when the population is subject to common noise. In this paper, we introduce an efficient deep reinforcement learning (DRL) algorithm designed to achieve population-dependent Nash equilibria without relying on averaging or historical sampling, inspired by Munchausen RL and Online Mirror Descent. The resulting policy is adaptable to various initial distributions and sources of common noise. Through numerical experiments on seven canonical examples, we demonstrate that our algorithm exhibits superior convergence properties compared to state-of-the-art algorithms, particularly a DRL version of Fictitious Play for population-dependent policies. The performance in the presence of common noise underscores the robustness and adaptability of our approach.
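The paper's algorithm is a deep, population-dependent variant, but the Online Mirror Descent iteration underneath it is easy to state in tabular form: each update multiplies the current policy by exponentiated Q-values and renormalizes. A minimal sketch, assuming known tabular Q-values; the Munchausen regularization and the mean-field coupling through the population distribution are omitted.

```python
import numpy as np

def omd_policy_update(pi, Q, lr=0.1):
    """One tabular Online Mirror Descent step with the entropic mirror map:
    pi_{k+1}(a|s) is proportional to pi_k(a|s) * exp(lr * Q_k(s, a))."""
    logits = np.log(pi) + lr * Q
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    new_pi = np.exp(logits)
    return new_pi / new_pi.sum(axis=1, keepdims=True)

# Toy usage: 3 states, 2 actions (hypothetical Q-values).
pi = np.full((3, 2), 0.5)
Q = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
for _ in range(10):
    pi = omd_policy_update(pi, Q)
print(pi.round(3))  # mass concentrates on the better action where Q differs
```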